I. INTRODUCTION


The random occurrence of computer down-time can delay the completion of computer-dependent tasks if it is not accounted for in initial job-duration estimates. But how can a phenomenon like computer failure be estimated when it evolves through time in a manner that is not completely predictable? This study answers that question by applying random process modeling techniques to the random down-times of a large-scale scientific computer.* The models developed in this study give the decision maker a quantitative method of predicting computer availability for scheduling purposes. They can answer, in terms of probabilities, questions such as:

a) If the computer is down (up) now, what is the probability of it being up (down) after a specified amount of time has elapsed?

b) What is the probability of experiencing N failures (N = 0, 1, 2, ...) within a specified time interval?

c) If the system has N failures today, what is the probability of having M failures tomorrow (M, N = 0, 1, 2, ...)?

The study was performed by deriving probability distributions from actual computer down-time data and then using the expected values of these distributions as inputs to stochastic models of the computer's random up/down cycle. Two categories of stochastic processes were used:

a) Discrete time processes:

Discrete parameter Markov chain $\{X_n(t),\ t = 0, 1, 2, \ldots;\ n = 0, 1, 2, \ldots\}$

b) Continuous time processes:

1) Poisson process $\{N(t),\ t \ge 0\}$

2) Discrete parameter Markov process $\{X_n(t),\ t \ge 0;\ n = 0, 1, 2, \ldots\}$

These processes and their applications are discussed in Section III.

Six months of chronological data on computer down-time were obtained through the cooperation of the Directorate for Management Information Systems, US Army Missile Readiness Command, and are included in Appendix A.


* Scientific Computer User's Guide, US Army Missile Command, October 1974.
II. DATA ANALYSIS


A. General Assumptions

For this study, the data obtained through the Directorate for Management Information Systems were not in the proper form for immediate incorporation into the stochastic models developed for the stated problem. The raw data, presented in Appendix A, had to be interpreted, modified, analyzed, and developed into usable data.

To perform the required data transformation, it was necessary to make four major assumptions:

a) The first assumption concerns third-shift operation of the computer. During the third shift, preventive maintenance and the lack of a sense of urgency in repairing computer failures result in interrupt and failure times that are not consistent with first- and second-shift operations. For this reason, the data transformation was carried out using only the data from first- and second-shift operation. These shifts covered the time from 7:45 a.m. to midnight.

b) The second assumption is the combining of related interrupts into one failure. Past experience and subsequent data analysis show that multiple interrupts can actually be manifestations of one basic problem or failure. Therefore, if interrupts occur within 2 hours of each other and have the same listed cause, they are combined into one unit, which is called a failure.

c) Building on this definition of failure, the third assumption addresses a variable named operating time. The operating time is defined as the failure interarrival time. This time does not include the time between interrupts that are assumed to be manifestations of one failure. This is also a reasonable assumption from the point of view of operating time being usable time: if the computer is up between interrupts but only briefly, that time is essentially unusable unless a short job is submitted soon after the machine comes up.

d) The fourth and final major assumption deals with repair time. Following the same logic used in developing the failure and operating time assumptions, the repair time is assumed to include the total time necessary to correct a given failure. This includes the down-time of related interrupts and the "unusable" up-time between related interrupts; that is, the repair time is the total elapsed time from the first interrupt until the failure is repaired.

The four major assumptions (definitions) can be summarized as follows: consider first- and second-shift operation only; a failure is a combination of related interrupts; operating time is the failure interarrival time; and repair time is the total time to correct a failure.
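The four definitions can be made concrete with a minimal Python sketch. It is not the procedure the study used (the transformation was done with a small programmable calculator); it assumes each interrupt is recorded as a hypothetical (start, end, cause) tuple in hours on a continuous first/second-shift clock, and it reads "within 2 hours of each other" as the gap from the end of one interrupt to the start of the next. The names `Failure`, `combine_interrupts`, and `operating_times` are illustrative only.

```python
from dataclasses import dataclass

@dataclass
class Failure:
    start: float     # start of the first related interrupt (hours)
    end: float       # time the failure is finally corrected (hours)
    cause: str       # listed cause, e.g. "H" (hardware) or "S" (software)
    interrupts: int  # number of interrupts combined into this failure

    @property
    def repair_time(self) -> float:
        # Assumption (d): elapsed time from the first interrupt to the final repair,
        # including the "unusable" up-time between related interrupts.
        return self.end - self.start

def combine_interrupts(interrupts, window=2.0):
    """Assumption (b): merge same-cause interrupts separated by at most `window` hours."""
    failures = []
    for start, end, cause in sorted(interrupts):
        last = failures[-1] if failures else None
        if last is not None and last.cause == cause and start - last.end <= window:
            last.end = end            # extend the current failure
            last.interrupts += 1
        else:
            failures.append(Failure(start, end, cause, 1))
    return failures

def operating_times(failures):
    """Assumption (c): up-time from the repair of one failure to the start of the next."""
    return [nxt.start - prev.end for prev, nxt in zip(failures, failures[1:])]

# Hypothetical record: two software interrupts 0.5 h apart, then a hardware interrupt.
events = [(10.0, 10.2, "S"), (10.7, 10.9, "S"), (30.0, 30.4, "H")]
fs = combine_interrupts(events)
print(len(fs), fs[0].interrupts, round(fs[0].repair_time, 2), operating_times(fs))
```

In this toy record the two software interrupts become one failure whose repair time is 0.9 hour (0.4 hour down plus 0.5 hour of unusable up-time), and only the 19.1-hour gap before the hardware interrupt counts as operating time.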

B. Basic Data Transformation

Using the stated assumptions and a small programmable calculator, a basic data transformation was accomplished. Table 1 is an example of this data transformation. The raw data are in terms of interrupts: the time the interrupt started, the time the interrupt was corrected, the total time of the interrupt, and the cause of the interrupt. The transformed data are in terms useful for developing the parameters and relationships needed to model computer availability as a stochastic process. The main point to note in Table 1 is the effect of combining related interrupts into a single failure. In doing so, the up-time marked with an asterisk (*) is not used in determining computer operating time; instead, it is added to the down-time to get the total time for repair. Also, because two software (S) interrupts are combined into one software failure, the number of failures on that particular day is one less than the number of interrupts. The transformed data are included in Appendix B, and the estimates of the distribution parameters were calculated from these transformed data.


C. Distribution of the Variables

Before the estimates of the distribution parameters could be used in the models, it was necessary to provide some assurance that the assumed distributions of the variables are justified. No statistical test can establish with certainty that data conform to an assumed distribution; however, a chi-square goodness-of-fit test can be performed with the available data to test the null hypothesis that the data fit a particular distribution. As a refresher, Table 2 outlines the chi-square goodness-of-fit test. This test was used extensively in the data analysis, not only to verify the use of a particular distribution but also to support some of the major assumptions concerning failure, operating time, and repair time. It is used in most of the remaining data analyses covered in this section.
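Table 2 is not reproduced here, so the following is a minimal sketch of a standard chi-square goodness-of-fit test for a Poisson hypothesis, applied to days classified by number of failures. The day counts are hypothetical (they are not the 128 days behind Table 3); tail cells are pooled until each expected count is at least 5, a common rule of thumb assumed here; one extra degree of freedom is subtracted for the estimated mean; and the 5-percent critical values are standard chi-square table entries.

```python
import math

def poisson_pmf(k, mu):
    """Poisson probability of exactly k events when the mean is mu."""
    return math.exp(-mu) * mu ** k / math.factorial(k)

def chi_square_poisson(observed, min_expected=5.0):
    """observed[k] = number of days with k failures (last cell read as "k or more").

    Returns (statistic, degrees_of_freedom, pooled_expected_counts).
    """
    n = sum(observed)
    mu = sum(k * c for k, c in enumerate(observed)) / n   # sample mean failures/day
    probs = [poisson_pmf(k, mu) for k in range(len(observed) - 1)]
    probs.append(1.0 - sum(probs))                        # upper tail P[K >= last]
    obs = list(observed)
    exp_counts = [n * p for p in probs]
    # Pool the tail cell into its neighbour until its expected count is large enough.
    while len(exp_counts) > 2 and exp_counts[-1] < min_expected:
        tail_e, tail_o = exp_counts.pop(), obs.pop()
        exp_counts[-1] += tail_e
        obs[-1] += tail_o
    stat = sum((o - e) ** 2 / e for o, e in zip(obs, exp_counts))
    dof = len(exp_counts) - 1 - 1                         # cells - 1 - estimated mean
    return stat, dof, exp_counts

# Hypothetical 128-day sample: days with 0, 1, 2, 3, 4 failures.
days = [60, 40, 18, 8, 2]
stat, dof, expected = chi_square_poisson(days)
critical_5pct = {1: 3.841, 2: 5.991, 3: 7.815}[dof]      # chi-square table, alpha = 0.05
reject = stat > critical_5pct
print(round(stat, 3), dof, reject)
```

If the statistic exceeds the critical value the Poisson hypothesis is rejected; otherwise the test "fails to reject" it, which is the wording used for the failure data in Table 3.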

D. Chi-Square Test

Table 3 contains several pieces of information. The chi-square test rejects the hypothesis that the interrupts follow a Poisson distribution, but fails to reject the hypothesis that the computer failures follow a Poisson distribution. The number of days on which observations were made and the average number of failures per day are also listed. Table 4 contains the same basic failure data expressed in a different format: it shows the transition matrix for failures per day.

Although 128 days of observation went into the development of this matrix, numerous zeros appear in its lower right-hand corner. These zeros do not indicate impossible transitions; they indicate transitions that are not very likely to occur and that need more data points to be estimated adequately.

TABLE 1. DATA TRANSFORMATION EXAMPLE

TABLE 3. INTERRUPTS AND FAILURES

TABLE 4. TRANSITION MATRIX
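One plausible way to build a matrix like Table 4 is to count day-to-day transitions between failure-count states and divide each row by its total. The sketch below is not taken from the report: it assumes consecutive list entries are consecutive observed days (the record itself skips weekends), lumps four or more failures into the top state, and uses a hypothetical record. Rows for states that were never left stay at zero, which is the small-sample effect described above.

```python
def estimate_transition_matrix(daily_failures, n_states=5):
    """Estimate one-step transition probabilities from a day-by-day failure record.

    States are failure counts 0..n_states-1; larger counts are lumped into the
    top state. Rows with no observed departures are left as all zeros.
    """
    counts = [[0] * n_states for _ in range(n_states)]
    capped = [min(x, n_states - 1) for x in daily_failures]
    for today, tomorrow in zip(capped, capped[1:]):
        counts[today][tomorrow] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return counts, matrix

# Hypothetical record of failures on consecutive observed days (not Appendix B data).
record = [0, 1, 0, 0, 2, 1, 0, 3, 0, 1, 1, 0]
counts, P = estimate_transition_matrix(record)
for row in P:
    print([round(p, 2) for p in row])
```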

E. Up-Time/Operating Time Analysis

Table 5 shows the up-time/operating time analysis. As in the interrupt/failure analysis, the basic assumptions are supported: the up-times do not conform to an exponential distribution, whereas the operating times are exponentially distributed.

F. Down-Time/Repair Time Analysis

In the down-time/repair time analysis (Table 6), the goodness-of-fit tests reject the assumed exponential character of both the down-times and the repair times. Taken alone, this result would be almost catastrophic, because applying the repair time data to the chosen random process model (Section III.B) depends on an exponentially distributed repair time. Fortunately, a more thorough analysis of the data uncovers the bifurcated nature of the repair time distribution. The repair time can be divided into two distributions, with the dividing point at 1 hour. With this distinction, the tests fail to reject the null hypotheses that the two distributions are exponential (Table 7). Apparently there are two distinct classes of failure: one class is easily and quickly repaired; the other requires either expert service or a basic machine restart.
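The split can be sketched as follows. The report's own figures are a Case I mean of 0.1934 hour (5.17 repairs/hour) and a Case II mean of 2.149 hours (0.4653 repairs/hour), with two thirds of the failures in Case I; the sample below is hypothetical. Each rate is taken as the reciprocal of its class mean, which reproduces how those figures relate, and a repair of exactly 1 hour is assigned to the short class by assumption. Note that this simple estimate ignores the truncation of each class at the cutoff.

```python
def split_repair_times(repair_hours, cutoff=1.0):
    """Split repair times at `cutoff` hours and estimate an exponential rate per class.

    Returns (fraction_short, rate_short, rate_long); each rate is 1 / class mean.
    """
    short = [r for r in repair_hours if r <= cutoff]
    long_ = [r for r in repair_hours if r > cutoff]
    p_short = len(short) / len(repair_hours)
    rate_short = len(short) / sum(short)   # repairs per hour, short class
    rate_long = len(long_) / sum(long_)    # repairs per hour, long class
    return p_short, rate_short, rate_long

# Hypothetical repair times in hours (not the report's data).
times = [0.1, 0.2, 0.25, 0.15, 0.3, 0.2, 1.5, 2.5, 3.0]
p_short, nu1, nu2 = split_repair_times(times)
print(round(p_short, 3), round(nu1, 3), round(nu2, 3))
```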

G. Data Analysis Summary

Table 8 is a data analysis summary. It contains the four major assumptions (definitions), the assumed distributions of the three variables (failures, operating time, repair time), and the estimates of the parameters of the assumed distributions. The data in Table 8 are the basis of the applications discussed in the next section.

TABLE 5. UP-TIMES AND OPERATING TIMES

TABLE 6. DOWN-TIMES AND REPAIR TIMES

TABLE 7. REPAIR TIMES/TWO DISTRIBUTIONS


III. RANDOM PROCESS MODELS AND THEIR APPLICATIONS

A. Discrete Time Processes

The computer is assumed to be observed at a discrete set of times. The successive observations are denoted by $X_0, X_1, \ldots, X_n, \ldots$, where the $X_n$ are assumed to be random variables. The sequence $\{X_n\}$ is a Markov chain if each random variable $X_n$ is discrete and

$$P[X_n = j_n \mid X_{n-1} = j_{n-1}, X_{n-2} = j_{n-2}, \ldots, X_0 = j_0] = P[X_n = j_n \mid X_{n-1} = j_{n-1}]$$

This is intuitively interpreted as: given the "present" of the process, the "future" is independent of its "past." In addition, the process is assumed to be stationary, so it is sufficient to specify the one-step transition probabilities

$$p_{ij} = P[X_1 = j \mid X_0 = i]$$

because the one-step transition probabilities are the same at every step number. The square matrix $P$ whose elements are the $p_{ij}$ is called the one-step transition matrix of a discrete parameter Markov chain.

A five-state transition matrix was developed (Table 4) using the number of computer failures per day (0, 1, 2, 3, or 4) as the state and one day as the time interval for a transition. This transition matrix can be used to determine the probability of being in any state after n days have elapsed, given a starting state. This can be done in two ways. One technique is to raise the matrix $P$ to the $n$th power, an operation that must be repeated for every n of interest. Another technique is to use the Z transform to obtain a matrix whose elements are functions of n. This yields probabilities that are fairly easily evaluated for any n, but calculating the Z-transform solution involves evaluating a 5 × 5 matrix.
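The first technique, raising $P$ to the $n$th power, is easy to sketch in plain Python using repeated squaring. Because the entries of Table 4 are not reproduced in this text, the example uses a hypothetical two-state chain whose n-step probabilities have the known closed form $p_{00}^{(n)} = 5/6 + (1/6)(0.4)^n$, which makes the result easy to check; the same functions apply unchanged to a 5 × 5 matrix.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def n_step(P, n):
    """Return P**n by repeated squaring; entry [i][j] is P[X_n = j | X_0 = i]."""
    size = len(P)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    base = P
    while n > 0:
        if n & 1:
            result = mat_mul(result, base)
        base = mat_mul(base, base)
        n >>= 1
    return result

# Hypothetical two-state chain (0 = failure-free day, 1 = at least one failure).
P = [[0.9, 0.1],
     [0.5, 0.5]]
P10 = n_step(P, 10)
print([[round(p, 4) for p in row] for row in P10])
```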


Another approach to extracting information from this five-state transition matrix is to determine the steady-state, or equilibrium, probabilities. These steady-state probabilities are independent of the initial state; they are obtained by solving the set of simultaneous equations

$$\pi = \pi P, \qquad \sum_i \pi_i = 1$$

where $\pi$ is a row vector, $P$ is the transition matrix, and the $\pi_i$ are the components of $\pi$. The normalization $\sum_i \pi_i = 1$ is needed because the equations $\pi = \pi P$ alone determine $\pi$ only up to a constant factor. Solving this set of equations gives the steady-state probabilities

$$\pi = (0.469,\ 0.329,\ 0.131,\ 0.054,\ 0.013)$$

To give some physical interpretation to these probabilities: if the computer operation were observed for 100 days, the following would be predicted:

No. of days with no failures: 47

No. of days with one failure: 33

No. of days with two failures: 13

No. of days with three failures: 5

No. of days with four failures: 1
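The steady-state equations can be solved as an ordinary linear system: replace one of the redundant equations of $\pi = \pi P$ with $\sum_i \pi_i = 1$ and apply Gaussian elimination. The sketch below does this in plain Python; it is checked on a hypothetical two-state chain whose answer is known to be (5/6, 1/6), since the Table 4 entries are not available here. Multiplying the resulting vector by the number of days observed gives predicted day counts, as in the 100-day interpretation above.

```python
def steady_state(P):
    """Solve pi = pi P with sum(pi) = 1 by Gaussian elimination with partial pivoting."""
    n = len(P)
    # Row i encodes sum_j (P[j][i] - delta_ij) * pi_j = 0, i.e. (P^T - I) pi = 0.
    A = [[P[j][i] - (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    b = [0.0] * n
    # These equations are linearly dependent; replace the last with sum(pi) = 1.
    A[-1] = [1.0] * n
    b[-1] = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    pi = [0.0] * n
    for r in reversed(range(n)):               # back substitution
        pi[r] = (b[r] - sum(A[r][c] * pi[c] for c in range(r + 1, n))) / A[r][r]
    return pi

# Hypothetical two-state chain (not Table 4).
P = [[0.9, 0.1],
     [0.5, 0.5]]
pi = steady_state(P)
days100 = [round(100 * p) for p in pi]         # predicted days out of 100
print([round(p, 4) for p in pi], days100)
```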

B. Continuous Time Processes

Phillips* describes continuous time stochastic processes as being similar in most respects to discrete time stochastic processes. He cautions that additional complexities can arise because every infinitesimal instant is available for a possible transition.

1. The Poisson Process

The occurrence of computer breakdowns can be described by a counting function $\{N(t),\ t \ge 0\}$, which represents the number of breakdowns that have occurred during the time period from 0 to t. According to Parzen,** this counting process is a Poisson process with mean rate $\lambda$ if the following assumptions are fulfilled:

(i) $\{N(t),\ t \ge 0\}$ has stationary independent increments.

(ii) The number of counts in a specified interval, say $t - s$ $(0 < s < t)$, is Poisson distributed such that

$$P[N(t) - N(s) = k] = \begin{cases} \dfrac{e^{-\lambda(t-s)}\,[\lambda(t-s)]^k}{k!}, & k = 0, 1, 2, \ldots \\[4pt] 0, & \text{otherwise} \end{cases}$$


* Phillips, D., Ravindran, A., and Solberg, J., Operations Research: Principles and Practice, New York: John Wiley and Sons, Inc., 1976.

** Parzen, E., Stochastic Processes, New York: Holden-Day, Inc., 1962.

The mean rate $\lambda$ is interpreted as the arrival rate of breakdowns. The parameter of the distribution, $\lambda(t - s)$, increases linearly with the length of the time interval. Figure 1 provides an example of the Poisson model used to predict computer failures for a specified time interval, illustrating Poisson distribution graphs for two time intervals. The probability of zero failures, $e^{-\lambda(t-s)}$, decreases as the time interval increases.
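Question (b) of the Introduction can be answered directly from this formula. The sketch assumes the Table 8 operating-time rate of 0.0532 failures/hour can serve as the Poisson mean rate over the interval of interest; the alternative daily estimate, 0.820 failures/day, is used instead by measuring the interval in days. The interval lengths below are arbitrary examples, not the two intervals plotted in Figure 1, and `prob_failures` is a hypothetical helper name.

```python
import math

def prob_failures(k, interval, rate=0.0532):
    """P[N(t) - N(s) = k] for a Poisson process; `interval` = t - s in the units of `rate`."""
    if k < 0:
        return 0.0                    # the "otherwise" branch of the formula
    m = rate * interval               # Poisson parameter lambda * (t - s)
    return math.exp(-m) * m ** k / math.factorial(k)

# Hourly rate from Table 8: probabilities of 0, 1, 2 failures in 8 and 16 hours.
for hours in (8, 16):
    print(hours, [round(prob_failures(k, hours), 3) for k in range(3)])

# Daily rate from Table 8: probability of a failure-free day.
p_no_failure_day = prob_failures(0, 1, rate=0.820)
print(round(p_no_failure_day, 3))
```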

TABLE 8. DATA ANALYSIS SUMMARY
(First and Second Shift Operation)

Failures
    Definition: Combination of related interrupts (same cause within two hours)
    Distribution: Poisson, μ = 0.820 failures/day

Operating Time
    Definition: Failure interarrival time
    Distribution: Exponential, λ = 0.0532 failures/hour

Repair Time
    Definition: Total time to correct one failure (includes nonusable up-time)
    Distribution: Bifurcated
        Time ≤ 1 hour: Exponential, ν = 5.171 repairs/hour
        Time > 1 hour: Exponential, ν = 0.465 repairs/hour
2. The Discrete Parameter Markov Process

The computer's random up-time/down-time events occur as illustrated in Figure 2. The system can be in one of two states, "on" or "off." If it is "on," it operates for a random time before breaking down. If it is "off," it remains off for a random time until it is repaired. This phenomenon can be modeled as a two-valued stochastic process, termed here a discrete parameter Markov process; its state space is discrete (two states), while its parameter, time, is continuous. The assumptions for this continuous time Markov process are:


(i) The process satisfies the Markov property.

(ii) The process is stationary.

(iii) The probability of a transition from one state to another in a short time interval $\Delta t$ is proportional to $\Delta t$.

(iv) The probability of two or more changes of state in a short interval $\Delta t$ is zero (more precisely, negligible compared with $\Delta t$).


The transition probabilities $p_{ij}(t)$ can be derived from these assumptions. The subscripts 1 and 0 denote "up" and "down," respectively. Then, from assumption (iii),

$$P[\text{repair transition in } \Delta t] = p_{01}(\Delta t) = \nu\,\Delta t \tag{1}$$

$$P[\text{breakdown transition in } \Delta t] = p_{10}(\Delta t) = \lambda\,\Delta t \tag{2}$$

Consider now the function $p_{01}(t + \Delta t)$, the probability that the computer is in State 1 (up) at time $t + \Delta t$, given that it was in State 0 (being repaired) at time zero. Either the computer was being repaired at time t and was put into operation during the interval $\Delta t$, or it was operating at time t and continued to operate for the short interval $\Delta t$. In equation form,

$$p_{01}(t + \Delta t) = p_{00}(t)\,p_{01}(\Delta t) + p_{01}(t)\,p_{11}(\Delta t) \tag{3}$$


This is a special case of the Chapman-Kolmogorov equations for the continuous time case. Assumption (i) is required to permit multiplication of the probabilities referring to events during t and to events during $\Delta t$. Assumption (ii) is required to permit use of the same probability functions for the interval t and for the later interval $\Delta t$. Substituting Equations (1) and (2) into Equation (3), with $p_{11}(\Delta t) = 1 - \lambda\,\Delta t$, and rearranging gives the difference equation

$$\frac{p_{01}(t + \Delta t) - p_{01}(t)}{\Delta t} = \nu\,p_{00}(t) - \lambda\,p_{01}(t) \tag{4}$$


Taking the limit of both sides of Equation (4) as $\Delta t$ approaches zero results in the differential equation

$$\frac{dp_{01}(t)}{dt} = \nu\,p_{00}(t) - \lambda\,p_{01}(t) \tag{5}$$

The other three transition functions can be derived similarly:

$$\frac{dp_{00}(t)}{dt} = -\nu\,p_{00}(t) + \lambda\,p_{01}(t) \tag{6}$$

$$\frac{dp_{10}(t)}{dt} = -\nu\,p_{10}(t) + \lambda\,p_{11}(t) \tag{7}$$

$$\frac{dp_{11}(t)}{dt} = \nu\,p_{10}(t) - \lambda\,p_{11}(t) \tag{8}$$


Equations (5) through (8) are a system of linear first-order differential equations with constant coefficients. They can be solved directly using Laplace transforms and the initial conditions

$$p_{ij}(0) = \begin{cases} 1, & i = j \\ 0, & \text{otherwise} \end{cases}$$
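The solutions are

$$p_{00}(t) = \frac{\lambda + \nu e^{-(\lambda+\nu)t}}{\lambda + \nu}, \qquad p_{01}(t) = \frac{\nu}{\lambda + \nu}\left(1 - e^{-(\lambda+\nu)t}\right)$$

$$p_{10}(t) = \frac{\lambda}{\lambda + \nu}\left(1 - e^{-(\lambda+\nu)t}\right), \qquad p_{11}(t) = \frac{\nu + \lambda e^{-(\lambda+\nu)t}}{\lambda + \nu}$$

In matrix form:

$$P(t) = \begin{pmatrix} p_{00}(t) & p_{01}(t) \\ p_{10}(t) & p_{11}(t) \end{pmatrix} = \frac{1}{\lambda + \nu} \begin{pmatrix} \lambda + \nu e^{-(\lambda+\nu)t} & \nu - \nu e^{-(\lambda+\nu)t} \\ \lambda - \lambda e^{-(\lambda+\nu)t} & \nu + \lambda e^{-(\lambda+\nu)t} \end{pmatrix}$$

The terms in each row sum to 1, as they must in order to represent Markov probabilities. Taking the limit of $P(t)$ as t approaches infinity,

$$\lim_{t \to \infty} P(t) = \begin{pmatrix} \dfrac{\lambda}{\lambda + \nu} & \dfrac{\nu}{\lambda + \nu} \\[8pt] \dfrac{\lambda}{\lambda + \nu} & \dfrac{\nu}{\lambda + \nu} \end{pmatrix}$$

the steady-state probabilities are obtained.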
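As a check on Equations (5) through (8) and their solutions, the sketch below integrates the four differential equations numerically with a fourth-order Runge-Kutta scheme (a standard method chosen here for illustration, not one used in the report) and compares the result with the closed-form $P(t)$. The rates are the Case II values used later in this section; the step count is an arbitrary choice.

```python
import math

def closed_form(t, lam, nu):
    """Closed-form [[p00, p01], [p10, p11]] at time t (state 0 = down, 1 = up)."""
    s = lam + nu
    e = math.exp(-s * t)
    return [[(lam + nu * e) / s, nu * (1 - e) / s],
            [lam * (1 - e) / s, (nu + lam * e) / s]]

def derivatives(p, lam, nu):
    """Right-hand sides of Equations (5) through (8)."""
    (p00, p01), (p10, p11) = p
    return [[-nu * p00 + lam * p01, nu * p00 - lam * p01],
            [-nu * p10 + lam * p11, nu * p10 - lam * p11]]

def add(a, k, c):
    """Return a + c * k for 2 x 2 matrices."""
    return [[a[i][j] + c * k[i][j] for j in range(2)] for i in range(2)]

def integrate(t_end, lam, nu, steps=2000):
    """Fourth-order Runge-Kutta from the initial condition p_ij(0) = delta_ij."""
    h = t_end / steps
    p = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(steps):
        k1 = derivatives(p, lam, nu)
        k2 = derivatives(add(p, k1, h / 2), lam, nu)
        k3 = derivatives(add(p, k2, h / 2), lam, nu)
        k4 = derivatives(add(p, k3, h), lam, nu)
        p = [[p[i][j] + h / 6 * (k1[i][j] + 2 * k2[i][j] + 2 * k3[i][j] + k4[i][j])
              for j in range(2)] for i in range(2)]
    return p

lam, nu = 0.0532, 0.4653          # breakdown and Case II repair rates, per hour
numeric = integrate(10.0, lam, nu)
exact = closed_form(10.0, lam, nu)
print(max(abs(numeric[i][j] - exact[i][j]) for i in range(2) for j in range(2)))
```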

This model is now ready to be exercised using the expected values for operating times and repair times obtained in Section II. The value of $\lambda$ is 1/18.8 per hour (a mean operating time of 18.8 hours), but two values of $\nu$ must be considered because of the bifurcated repair time distribution.


CASE I: Computer repair time ≤ 1 hour

$$p[\text{CASE I}] = \frac{\text{Number of failures with repair time} \le 1 \text{ hour}}{\text{Total number of failures}} = 2/3$$

mean repair time for Case I = 0.1934 hour
$\nu_1$ = 1/0.1934 = 5.17/hour
$\lambda$ = 1/18.8 = 0.0532/hour

CASE II: Computer repair time > 1 hour

p[CASE II] = 1 − p[CASE I] = 1/3
mean repair time for Case II = 2.149 hours
$\nu_2$ = 1/2.149 = 0.4653/hour
$\lambda$ = 0.0532/hour

$$P_{II}(t) = \frac{1}{0.5185} \begin{pmatrix} 0.0532 + 0.4653\,e^{-0.5185t} & 0.4653\,(1 - e^{-0.5185t}) \\ 0.0532\,(1 - e^{-0.5185t}) & 0.4653 + 0.0532\,e^{-0.5185t} \end{pmatrix}$$

Plots of the four transition probabilities as a function of time for Cases I and II are provided in Figures 3 through 6. The combined Case I and II probabilities are also shown for each transition. The combined functions were obtained using the following equation:

$$P_{I+II}(t) = \tfrac{2}{3}\,P_I(t) + \tfrac{1}{3}\,P_{II}(t)$$


The combined transition matrix should be used unless a conditional probability situation occurs. For example, if the computer is down now and has been down continuously for over 1 hour, the Case II transition curves can be used directly to determine the state probabilities $p_{00}(t)$ and $p_{01}(t)$. However, if the system is up now and the probability of it staying up for a specified time interval is wanted, the combined probability curve in Figure 6 should be used, because no conditional information about the class of the next failure is available.
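The combined matrix $P_{I+II}(t)$ is simple to evaluate numerically. The sketch below uses the Case I and Case II rates and weights listed above, builds each case's two-state matrix from the closed-form solution, and mixes them 2/3 to 1/3. The final line evaluates the steady-state probability of being up under the same weighting; that number is our own evaluation from the stated values, not a figure quoted from the report.

```python
import math

LAM = 1 / 18.8                    # breakdown rate, per hour
CASES = [(2 / 3, 1 / 0.1934),     # (weight, repair rate nu) for Case I
         (1 / 3, 1 / 2.149)]      # Case II

def two_state(t, lam, nu):
    """Closed-form [[p00, p01], [p10, p11]] (state 0 = down, 1 = up)."""
    s = lam + nu
    e = math.exp(-s * t)
    return [[(lam + nu * e) / s, nu * (1 - e) / s],
            [lam * (1 - e) / s, (nu + lam * e) / s]]

def combined(t):
    """P_{I+II}(t) = 2/3 P_I(t) + 1/3 P_II(t), entry by entry."""
    total = [[0.0, 0.0], [0.0, 0.0]]
    for weight, nu in CASES:
        m = two_state(t, LAM, nu)
        for i in range(2):
            for j in range(2):
                total[i][j] += weight * m[i][j]
    return total

# Probability of still being down 2 hours after going down (combined curve).
print(round(combined(2.0)[0][0], 4))

# Steady-state probability of being up under the combined weighting.
rho = sum(w * nu / (LAM + nu) for w, nu in CASES)   # approximately 0.959
print(round(rho, 3))
```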


Figure 3. Probability of remaining "down." (Panels: Case I, downtime ≤ 1 hour; Case II, downtime > 1 hour; horizontal axis: time t.)

Figure 6. Probability of remaining "up." (Panels: Case I, downtime ≤ 1 hour; Cases I and II combined.)
IV. CONCLUSIONS

Random process modeling techniques provide a quantitative means of assessing computer availability. They can answer scheduling questions in terms of probabilities. The techniques and methods of analysis presented here should be applicable to any large-scale computer system.

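Although failures occur, the computer has a high availability factor. This factor, $\rho$, is obtained from the steady-state matrix derived in Section III.B. With the combined Case I and Case II probabilities accounted for, $\rho$ is the weighted steady-state probability that the computer is up:

$$\rho = \tfrac{2}{3}\,\frac{\nu_1}{\lambda + \nu_1} + \tfrac{1}{3}\,\frac{\nu_2}{\lambda + \nu_2}$$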


The computer availability probabilities will remain valid unless major system hardware or software changes are made (violating the Markovian assumptions).


Appendix A. CHRONOLOGICAL DATA ON COMPUTER DOWN TIME

CDC-6600 DOWN TIME CHRONOLOGY FOR 26 MAY - 25 JUNE 1976

CDC-6600 DOWN TIME CHRONOLOGY FOR 26 JUNE - JULY 1976

TOTAL HARDWARE INTERRUPTIONS = 20   TOTAL SOFTWARE INTERRUPTIONS = 1   TOTAL OTHER INTERRUPTIONS

TOTAL HARDWARE DOWN TIME = 1399   TOTAL SOFTWARE DOWN TIME = 42   TOTAL OTHER DOWN TIME = 375

Date

Time 

Up 

Down 

5.30 

- 

0.08 

2.42 

0.25 

28  May 

11.90 

1.35 

1 Jun 

12.55 

0.63 

3.03 

4.28 

2 Jun 

9.88 

0.15 

3 Jun 

- 

- 

4 Jun 

- 

- 

7 Jun 

- 

- 

8 Jun 

64.85 

0.17 

9 Jun 

22.83 

0.17 

10  Jun 

12.78 

5.30 

11  Jun 

2.12 

6.72 

0.68* 

0.10 

14  Jun 

20.33 

0.17 

15  Jun 

4.50 

0.28 

8.63 

0.18 

16  Jun 

- 

- 

17  Jun 

29.88 

0.13 

1.30 

0.37 

0.13 

18  Jun 

- 

- 

21  Jun 

- 

- 

22  Jun 

- 

- 

23  Jun 

59.08 

0.52 

24  Jun 

- 

- 

25  Jun 

37.73 

0.20 

28  Jun 

14.83 

0.38 

29  Jun 

9.02 

0.17 

2.53 

0.35 

30  Jun 

- 

- 

1 Jul

- 

0.17 

- 

0.02 

- 

0.23 

29.97 

0.67 

2 Jul 

18.25 

0.42 

0 

0.17 

- 

0.17 

6 Jul 

- 

- 

7 Jul 

- 

0.19 

- 

0.25 

Interrupts 


No.  of 


*These up-times will be deleted to form the operating times.
Date

Time

Interrupts

No. of
Failures

Up

Down

Repair

No.

Type

36.03 

0.28 

3 

H 

1 

0.27* 

0.05 

1 R'X 

H 

1.05* 

0.18 

H 

8 Jul 

10.05 

1.28 

0 

6.05 

0.17 

3 

S 

3 

4.25 

0.37 

H 

9 Jul 

- 

- 

0 

- 

0 

12  Jul 

18.35 

0.72 

H 

4.07 

0.13 

1 /, /. 

3 

H 

2 

0.28* 

1.03 

H 

13  Jul 

- 

- 

0 

- 

0 

14  Jul 

26.62 

0.40 

1 

H 

1 

15  Jul 

17.95 

0.17 

H 

2.13 

0.12 

3 

H 

2 

0.58* 

0.62 

1 . JZ 

H 

16  Jul 

- 

- 

0 

- 

0 

19  Jul 

- 

- 

0 

- 

0 

20  Jul 

- 

- 

0 

- 

0 

21  Jul 

67.22 

1.25 

1 

H 

1 

22  Jul 

15.75 

0.10 

1 

0 

1 

23  Jul 

- 

- 

0 

- 

0 

26  Jul 

30.23 

0.20 

1 

0 

1 

27  Jul 

- 

- 

0 

- 

0 

28  Jul 

- 

- 

0 

- 

0 

29  Jul 

- 

- 

0 

- 

0 

30  Jul 

65.22 

0.17 

1 

0 

1 

2 Aug 

12.27 

2.67 

2 

H 

2 

3.63 

0.12 

H 

3 Aug 

- 

0.17 

0 

S 

0 

4 Aug 

25.32 

0.48 

'i  nA 

H 

1.23* 

1.33 

3 

H 

2 

1.88 

0.27 

S 

5 Aug 

- 

- 

0 

- 

0 

6 Aug 

35.38 

0.25 

1 

H 

1 

9 Aug 

6.50 

0.15 

2 

H 

2 

7.92 

0.37 

0 

10  Aug 

- 

- 

0 

- 

0 

11  Aug 

- 

- 

0 

- 

0 

12  Aug 

- 

- 

0 

- 

0 

13  Aug 

- 

- 

0 

- 

0 

16  Aug 

85.57 

12.35** 

• - 

1 

H 

1 

17  Aug 

1.95 

0.27 

1 

S 

1 

*These up-times will be deleted to form the operating times.

**12.35  hours  from  2100  hours  16  Aug  to  0935  hours  17  Aug. 
Date

Time

Interrupts

No. of

Failures

Up

Down

Repair

No.

Type

18  Aug 

20.20 

0.28 

n 

0 

1 

19  Aug 

7.95 

1.68 

mm 

11 

5.60 

0.17 

O 

3 

1.05 

3.58 

H 

20  Aug 

- 

- 

0 

- 

0 

23  Aug 

21.93 

0.32 

1 

0 

l 

24  Aug 

22.75 

4.45 

1 

0 

1 

25  Aug 

5.92 

0.30 

2 

0 

2 

25  Aug 

3.13 

0.70 

H 

26  Aug 

- 

— 

0 

- 

0 

27  Aug 

- 

- 

0 

- 

0 

30  Aug 

- 

- 

0 

- 

0 

31  Aug 

59.38 

0.12 

0 

0 

0.33* 

0.32 

4 

H 

2 

0 . k8* 

1.00 

0 

0.63* 

1.82 

1*11 

H 

1 Sep 

- 

- 

0 

- 

0 

2 Sep 

33.38 

0.13 

1 

0 

1 

3 Sep 

- 

- 

0 

- 

0 

7 Sep 

26.12 

12.42 

1 

0 

1 

8 Sep 

6.30 

0.28 

2 

0 

2 

6.20 

0.42 

s 

9 Sep 

- 

- 

0 

- 

0 

10  Sep 

- 

- 

0 

- 

0 

13  Sep 

- 

- 

0 

“ 

0 

14  Sep 

56.98 

0.37 

1 

H 

1 

15  Sep 

- 

- 

0 

- 

0 

16  Sep 

33.12 

0.25 

H 

6.28 

0.17 

■ 

S 

3 

1.05 

0.67 

0 

17  Sep 

16.25 

0.28 

1 (\fs. 

. 

0 

1.38 

1.00 

4 • v)v) 

0 

20  Sep 

2.83 

0.75 

H 

0.65 

0.05 

0 

2 

0.72* 

0.28 

0 

0.22* 

0.55 

0 

21  Sep 

- 

- 

0 

- 

0 

22  Sep 

- 

- 

0 

- 

0 

23  Sep 

- 

- 

0 

- 

0 

24  Sep 

-- 

- 

0 

- 

0 

27  Sep 

78.78 

0.83 

2.09 

H 

1 

0.83* 

0.43 

H 

*These up-times will be deleted to form the operating times.
Time

Interrupts 

No.  of 
Failures 

Date 

Up 

Down 

Repair 

No. 

Type 

28  Sep 

20.98 

0.23 

2.09 

1 

0 

1 1 

29  Sep 

- 

- 

9 ftO 

0 

- 

0 

30  Sep 

24.68 

4.55** 

1 

O 

1 

1 Oct 

- 

0 

- 

0 

4 Oct 

- 

- 

0 

- 

0 

5 Oct 

- 

- 

0 

0 

6 Oct 

- 

- 

0 

- 

0 

7 Oct 

- 

- 

0 

- 

0 

8 Oct 

96.53 

0.15 

2 

0 

0.15* 

0.20 

0.50 

0 

12  Oct 

14.03 

0.30 

H 

1 

1.45* 

0.12 

I • O / 

H 

PHbH 

5.13 

0.12 

5 

H 

0.17* 

0.10 

U • Jy 

H 

0.15 

0.35 

O 

13  Oct 

10.02 

0.53 

2 

H 

0.15* 

0.67 

1 • jj 

H 

14  Oct 

- 

0.42 

0 

O 

0 

15 Oct

29.47 

0.73 

O 

1.23* 

0.37 

Z • J J 

O 

2.15 

0.17 

5 

O 

3 

0.17* 

0.25 

U .J7 

0 

2.78 

0.23 

S 

18  Oct 

- 

- 

0 

- 

0 

19  Oct 

30.25 

0.23 

n 

s 

1 

20  Oct 

10.48 

0.13 

SK 

H 

0.15* 

1.00 

1 • 40 

II 

H 

2 

1.78 

0.35 

O 

1.53* 

0.05 

1 .7J 

1 

0 

21  Oct 

- 

- 

| 

- 

0 

22  Oct 

40.87 

10.23** 

2.50 

n 

H 

1 

26  Oct 

25.07 

0.27 

O 

1 

27  Oct 

- 

- 

0 

- 

0 

28  Oct 

26.67 

0.15 

1 

0 

1 

- 

0.42 

0 

29  Oct 

18.38 

1.12 

2 

H 

2 

2.20 

0.15 

0 

30  Oct 

0 

0 

*These up-times will be deleted to form the operating times.

**10.23 hours from 2130 hours 22 Oct to 0744 hours 23 Oct.
*8.08 hours from 1945 hours 19 Nov to 351 hours 20 Nov.

**These up-times will be deleted to form the operating times.